Don't let your application get haunted by database downtime. Learn how you can stay online even in the event of node, region, or even cloud outages with CockroachDB.
Survive anything.Let’s start by ripping the band-aid off: there are no ghosts, ghouls, or goblins in the story of the Halloween problem. But for anyone who works with databases, it does tell a scary story about how a simple SQL operation can haunt your database in a pretty unexpected way.
So, let’s talk about the “Halloween Problem,” and whether it’s truly as spooky as it sounds.
The Halloween Problem describes a phenomenon in which INSERT
, UPDATE
, DELETE
, and MERGE
queries in SQL can, under certain circumstances, result in rows being updated multiple times during an operation in which they should only be updated once.
The Halloween Problem was first discovered by Don Chamberlin, Patricia Selinger, and Morton Astrahan.
According to a 2001 interview with Chamberlin, it was uncovered while Selinger and Astrahan were working on query optimization. They were working with an example problem: giving 10% raises to every employee in a relational database table whose yearly salary was less than $25,000. They discovered that, if executed in a certain way, that query could end up running over and over again until every employee’s salary was at least $25,000.
Chamberlin says that Selinger and Astrahan discovered this on October 31, a Friday. Realizing that the issue was too big for them to solve before the weekend, the group decided to give it a name – the Halloween Problem, since it was discovered on Halloween – so that they could more easily refer to it in the future when they had time to address it. (Note: this discovery probably happened in 1975, but it’s not entirely certain. See our note at the end for additional details).
In short, the Halloween Problem can occur in relational databases if a write operation is executed on a table using a non-clustered index on a column that is being updated. In non-clustered indexes, the logical order of the index may not match the order in which the rows are stored physically — meaning that an update to a row can change its location. This, in turn, can move it back ahead of the index scan the query is using, resulting in the same row being updated multiple times…or possibly even in an infinite loop. Talk about scary!
We can demonstrate this using the same example Selinger and Astrahan were looking at when they uncovered it. Imagine we have a table of employees and salaries in which the first few rows look something like this:
employee | salary |
---|---|
ellen | 26500 |
parker | 13100 |
ash | 12300 |
If we wanted to update the table to give a 10% raise to anyone making under $25,000 per year, we might run a query like this:
UPDATE employee
SET salary = salary * 1.1
WHERE salary < 25000;
We would expect the result of that query to be an updated table in which Ellen’s salary does not change, but Parker’s salary is updated to $19,360 and Ash’s salary is updated to $13,530, and so on.
If, however, our query is processed using a non-clustered index, Parker and Ash’s salaries may be updated continually until they reach $25,000 or higher.
How? Imagine this is our index, a non-clustered index sorted (ascending) by the values in the salary
column:
employee | salary |
---|---|
12300 | ash |
13100 | parker |
26500 | ellen |
If our query is executed iteratively using this index, then it will look first at the row for ash
. Since this value is below 25000
, the value will be updated – as per our query, multiplied by 1.1, so the new value is 13530
. Updating that row will also update the index, so our index is now:
employee | salary |
---|---|
13100 | parker |
13530 | ash |
26500 | ellen |
As the query is proceeding iteratively, the next step will be to update the parker
row, but we can already see the problem here: our update to the ash
row also changed its position in the index. And since its value is still less than 25000
, the update will be applied to it again when the index scan reaches that point in the index.
And with these specific values, the same thing will happen with parker
’s salary, such that ash
and parker
will continually leapfrog each other in the index, remaining one step ahead of the index scan, until both values pass 25000
.
It’s important to note that this but is one version of the Halloween problem. It does have other forms, and can also impact INSERT
, DELETE
, and MERGE
statements in addition to UPDATE
statements.
Long story short: probably not. This problem, after all, was discovered by some of the core developers of relational database systems, and has been a known potential issue since the 1970s. Many modern relational database management systems thus have query optimizers that are designed to detect queries that require “Halloween protection” and execute them accordingly.
However, don’t let that prevent you from getting spooked by the Halloween problem! As mentioned, what we’ve described above is just one relatively simple example of the Halloween problem, taken from Chamberlin, Selinger, and Astrahan’s original example back in the System R days. The Halloween problem can potentially occur in other ways depending on the specifics of your DBMS. Many DBMS still have the potential for “Halloween-like” problems to occur under specific circumstances, and Halloween Protection strategies can sometimes have an impact on performance.
So, while the Halloween Problem might only have been named that because of the coincidental date on which it was discovered, the idea of it should still send a shiver down your spine… at least until you’ve checked your database’s documentation to see how it’s handled.
(If you’re a CockroachDB user, you can read a bit about the investigation of this problem in the context of our distributed SQL database here, and see how that issue was resolved here).
When was the original Halloween Problem discovered? Probably 1975, but the issue is clouded somewhat by the fallibility of (human) memory.
To be clear, Chamberlin and Selinger have both said the problem was discovered in 1976 or 1977. Wikipedia (and a plethora of other online sources citing it) also say it was 1976.
However, Chamberlin’s explanation for the name hinges on the problem having been discovered on a Friday. Halloween did not fall on a Friday in 1976 or 1977. So, Chamberlin must either be misremembering the reason for the name or misremembering the year. The latter seems more likely.
Halloween did fall on a Friday in 1975, and that was the only Friday Halloween of the 1970s, so if Chamberlin is remembering the reason for the naming correctly, 1975 seems to be the most likely candidate.
In this article, we’ll take a look at how to safely add and drop columns from a SQL database using ALTER TABLE … ADD …
This post was originally published in 2018 by former CockroachDB engineer Matt Jibson, who owns goats and makes his own …
Read moreMost SQL content on the web seems to be written with data analysts in mind. And that’s fine, but developers need …
Read more